Clustering and Classifying Person Names by Origin
نویسندگان
چکیده
In natural language processing, information about a person’s geographical origin is an important feature for name entity transliteration and question answering. We propose a language-independent name origin clustering and classification framework. Provided with a small amount of bilingual name translation pairs with labeled origins, we measure origin similarities based on the perplexities of name character language and translation models. We group similar origins into clusters, then train a Bayesian classifier with different features. It achieves 84% classification accuracy with source names only, and 91% with both source and target name pairs. We apply the origin clustering and classification technique to a name transliteration task. The cluster-specific transliteration model dramatically improves the transliteration accuracy from 3.8% to 55%, reducing the transliteration character error rate from 50.3 to 13.5. Adding more unlabeled name pairs to the cluster-specific name transliteration model further improves the transliteration accuracy.
منابع مشابه
Classifying Arab Names Geographically
Different names may be popular in different countries. Hence, person names may give a clue to a person’s country of origin. Along with other features, mapping names to countries can be helpful in a variety of applications such as country tagging twitter users. This paper describes the collection of Arabic Twitter user names that are either written in Arabic or transliterated into Latin characte...
متن کاملسیستم شناسایی و طبقه بندی اسامی در متون فارسی
Name entity recognition (NER) is a system that can identify one or more kinds of names in a text and classify them into specified categories. These categories can be name of people, organizations, companies, places (country, city, street, etc.), time related to names (date and time), financial values, percentages, etc. Although during the past decade a lot of researches has been done on NER in ...
متن کاملDetermining the Origin and Structure of Person Names
This paper presents a novel system HENNA (Hybrid Person Name Analyzer) for identifying language origin and analyzing linguistic structures of person names. We conduct ME-based classification methods for the language origin identification and achieve very promising performance. We will show that word-internal character sequences provide surprisingly strong evidence for predicting the language or...
متن کاملResolving Person Names in Web People Search
Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name the problem presents many difficulties and challenges. This chapter treats the task of person name disambig...
متن کاملEmpirical Investigation of Finding ‘Person’ in Social network using Clustering
We are trying to identify person as an entity from social network data by grouping the person with some characteristics. It is very difficult to identify people on social networks especially for specific names in different cultures like India, and other ethnic places whose names are difficult to trace. This is a small effort to bridge the gap of identifying person with features like name, date ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005